Skip to content

feat(AGX1-275): per-RPC task permission rewire and 404/403 wrap#249

Open
asherfink wants to merge 1 commit into
mainfrom
asher.fink/agx1-275-task-route-migration
Open

feat(AGX1-275): per-RPC task permission rewire and 404/403 wrap#249
asherfink wants to merge 1 commit into
mainfrom
asher.fink/agx1-275-task-route-migration

Conversation

@asherfink
Copy link
Copy Markdown

@asherfink asherfink commented May 26, 2026

Related work

Parent epic: AGX1-264 — per-task FGAC. Follow-ups bundled in AGX1-291.

This change is part of a 5-PR stack across 3 repos. Merge order: scaleapi/scaleapi#144783 (release sgp-authz 0.7.1) → scaleapi/agentex#353 → scaleapi/agentex#356 → #246 → this PR.

Repo PR Purpose
scaleapi/scaleapi scaleapi/scaleapi#144783 sgp-authz 0.7.1 — Action.CANCEL
scaleapi/agentex scaleapi/agentex#353 agentex-auth per-account routing + cancel op
scaleapi/agentex scaleapi/agentex#356 agentex-auth register_resource API + cancel cleanup
scaleapi/scale-agentex #246 task creator audit columns + FGAC dual-write + flag
scaleapi/scale-agentex #249 (this PR) per-RPC operation rewire + 404/403 wrap

Last in the stack — this is the route-side change that actually exercises the permissions written by #246.

Summary

  • Routes each task-resource RPC to the correct AuthorizedOperationType instead of using execute everywhere: MESSAGE_SEND / EVENT_SENDupdate, TASK_CANCELcancel, TASK_CREATE stays create.
  • Broadens the execute → update swap across messages.py, checkpoints.py, states.py so the editor role can send messages and update checkpoints/state without needing owner. The task SpiceDB schema defines permission update = (editor + owner) & internal_tenant_gate, so leaving these on execute (owner-only) would lock editors out of routine mutations.
  • Collapses every task-resource denial into 404 (via ItemDoesNotExist) across all surfaces — path id, query id, body id, and name routes — so callers can no longer distinguish "task present in another tenant" from "task absent" by comparing 403 vs 404.
  • Extracts the collapse helper to src/utils/task_authorization.py, reused from both the FastAPI dep factories and the RPC authorize hook.

What changed

  • src/utils/task_authorization.py (new): _check_task_or_collapse_to_404(authorization, task_id, operation) — the shared wrap.
  • src/utils/authorization_shortcuts.py: DAuthorizedId / DAuthorizedQuery / DAuthorizedBodyId route task checks through the wrap; their inner deps no longer take a task_repository (parameter was unused). DAuthorizedName now applies the wrap when resource_type == AgentexResourceType.task — previously the name surface leaked 403 vs 404 because tasks.name is globally unique, so a probe checked the whole system rather than a single tenant.
  • src/api/routes/agents.py _authorize_rpc_request: each task-resource branch routes through the wrap. The MESSAGE_SEND block with task_name is restructured to a try/else shape so a denied-update on an existing task surfaces as 404 (it must NOT fall through to the create-fallback except — that would silently promote a denied-update into a create check, which is a privilege escalation footgun).
  • TASK_CREATE and the wildcard task("*") checks intentionally untouched.

AGX1-275 list deliverable — already covered

The AGX1-275 list-filtering deliverable (only return tasks the caller can read) is already met by existing infrastructure that this PR does not change:

  • DAuthorizedResourceIds(AgentexResourceType.task) at src/utils/authorization_shortcuts.py:130 resolves the per-principal accessible task id set.
  • It is wired into the list route at src/api/routes/tasks.py:82, which intersects user filters with the accessible set before hitting the repository.

No additional code is needed here — the list surface is already authorized end-to-end.

Pre-merge verification

The execute → update swap changes the operation literal that hits agentex-auth's check endpoint. The SGP gateway forwards the literal verbatim — needs confirmation that the SGP permissions backend resolves update (and cancel) against the same owner row everyone hitting these routes today already has. Otherwise every running agent's RPCs break at deploy time.

Two acceptable forms before merge:

  • Direct: read the SGP permissions schema and document file + line here.
  • Indirect: one-off integration test against the real SGP adapter that issues a task update check on an account whose only grant is owner. The wire-contract test already pins the literal; this would pin the resolution.

If SGP's task permission map doesn't include update / cancel, this PR needs to either land behind a flag (keep execute for SGP-routed accounts) or backfill the schema first.

Tests

  • tests/unit/api/test_tasks_authz.py — 17/17 pass.
    • 8 TestPerRpcOperationRouting tests (incl. MESSAGE_SEND create-fallback preserved through the restructure).
    • 2 TestCheckTaskOrCollapseTo404 tests (allow + denied-collapses-to-404).
    • 3 TestDAuthorizedBodyIdTaskWrap tests.
    • 3 TestDAuthorizedNameTaskWrap tests (denied-task → 404, allow returns name, agent path unaffected).
    • Wire-contract test pins cancel → "cancel" (mirrors agentex-auth's).
  • Ruff + ruff-format + alembic migration-safety lint clean (pre-commit hooks).

Out of scope / follow-ups (tracked in AGX1-291)

  • /agents/name/{agent_name} has the same leak shape — agent FGAC is outside AGX1-264.

Greptile Summary

This PR is the final step in the AGX1-264 per-task FGAC stack: it rewires each task-resource RPC to the correct AuthorizedOperationType (update for message/event sends, cancel for task cancel, create unchanged), extracts a shared _check_task_or_collapse_to_404 helper that converts every auth denial into a 404, and applies that helper across all four authorization-shortcut surfaces (path id, query id, body id, and name).

  • Operation rewireMESSAGE_SEND / EVENT_SENDupdate, TASK_CANCELcancel, plus the same execute → update swap on all message and checkpoint mutation endpoints so editor-role holders can perform routine mutations without needing owner.
  • 404/403 collapse — all task-resource denial paths (including the globally-unique task.name surface) now return 404, eliminating cross-tenant existence probes. The design trade-off (in-tenant users see 404 on permission-gap updates instead of 403) is documented with a tracked TODO.
  • Test coverage — 17 new unit tests verify per-RPC routing, the collapse behavior, the DAuthorizedBodyId and DAuthorizedName task wraps, and the cancel wire-contract literal.

Confidence Score: 3/5

The logic restructuring is correct, but the PR's own description leaves an explicit pre-merge action item open: confirming that SGP resolves the new update and cancel operation literals against the owner grant that all current agents hold. Without that confirmation, every in-flight agent's RPCs would silently receive 404 at deploy time.

The 404-collapse logic, the try/else privilege-escalation guard in MESSAGE_SEND, and the per-RPC operation routing are all implemented correctly and are well-tested. The risk is at the cross-repo boundary: the PR author explicitly calls out that the execute → update/cancel wire-level change requires proof that SGP's task permission schema maps both new operations to the existing owner grant before this can be safely deployed. That verification has not been documented as complete, which means the deployment could break all live agents' RPCs simultaneously.

agentex/src/api/routes/agents.py — the per-RPC authorization rewire is the change most sensitive to the unconfirmed SGP schema mapping.

Important Files Changed

Filename Overview
agentex/src/utils/task_authorization.py New helper that collapses AuthorizationError → ItemDoesNotExist (404) for all task-resource checks; logic is correct and well-documented with a TODO for the future 403/404 split. Minor: function is cross-module but uses the private _ naming convention.
agentex/src/utils/authorization_shortcuts.py Routes task-resource path/query/body/name checks through the collapse helper; adds an explicit elif AgentexResourceType.task branch to DAuthorizedId and DAuthorizedQuery that was previously missing, and the task-repository parameter removal (it was unused) is correct.
agentex/src/api/routes/agents.py Per-RPC operation rewire: MESSAGE_SEND/EVENT_SEND → update, TASK_CANCEL → cancel. The try/else restructure for MESSAGE_SEND with task_name correctly prevents denied-update from falling through to a create check. Pre-merge verification that SGP resolves update/cancel against the existing owner grant is flagged open in the PR description but not yet documented as completed.
agentex/src/api/routes/checkpoints.py Straightforward execute → update swap on both put_checkpoint and put_writes; no other logic changes.
agentex/src/api/routes/messages.py execute → update across all four message mutation endpoints (batch_create, batch_update, create, update); correct and consistent.
agentex/src/api/schemas/authorization_types.py Adds cancel to AuthorizedOperationType StrEnum; wire-format value "cancel" is locked by a unit test that mirrors the agentex-auth contract.
agentex/tests/unit/api/test_tasks_authz.py New test file with 17 tests covering per-RPC routing, 404-collapse behavior, DAuthorizedBodyId task wrap, DAuthorizedName task wrap, and the cancel wire-contract. Coverage is thorough for the changed paths.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant R as agents.py _authorize_rpc_request
    participant H as _check_task_or_collapse_to_404
    participant A as AuthorizationService
    participant T as TaskService

    C->>R: RPC request (MESSAGE_SEND / EVENT_SEND / TASK_CANCEL)
    alt task_id provided
        R->>H: "check(task_id, update|cancel)"
        H->>A: authorization.check(task resource, operation)
        alt Allowed
            A-->>H: OK
            H-->>R: returns
        else AuthorizationError
            A-->>H: AuthorizationError
            H-->>R: raise ItemDoesNotExist (404)
        end
    else task_name provided (MESSAGE_SEND create-fallback path)
        R->>T: "get_task(name=task_name)"
        alt Task absent
            T-->>R: ItemDoesNotExist
            R->>A: "check(task(*), create)"
        else Task present
            T-->>R: existing_task
            R->>H: check(existing_task.id, update)
            H->>A: authorization.check(task resource, update)
            alt Allowed
                A-->>H: OK
                H-->>R: returns
            else AuthorizationError
                A-->>H: AuthorizationError
                H-->>R: raise ItemDoesNotExist (404, NOT create fallback)
            end
        end
    end
    R-->>C: proceed or 404
Loading

Comments Outside Diff (1)

  1. agentex/src/api/routes/agents.py, line 320-416 (link)

    P1 Unconfirmed wire-contract before execute → update/cancel goes live

    The PR description itself flags this as a required pre-merge gate: changing the operation literal from execute to update (and introducing cancel) is only safe if SGP's task permission schema already maps both operations to the owner grant that every in-flight agent holds today. If it does not, every MESSAGE_SEND, EVENT_SEND, and TASK_CANCEL RPC will receive an auth denial at deploy time — collapsing to 404 — with no obvious error signal beyond agents silently failing.

    The description lists two acceptable verification forms (schema doc reference or one-off integration test against the real adapter) but does not record the outcome of either. Since all affected agents in production would be impacted simultaneously, this confirmation should be documented before the PR is merged.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: agentex/src/api/routes/agents.py
    Line: 320-416
    
    Comment:
    **Unconfirmed wire-contract before `execute → update`/`cancel` goes live**
    
    The PR description itself flags this as a required pre-merge gate: changing the operation literal from `execute` to `update` (and introducing `cancel`) is only safe if SGP's task permission schema already maps both operations to the `owner` grant that every in-flight agent holds today. If it does not, every `MESSAGE_SEND`, `EVENT_SEND`, and `TASK_CANCEL` RPC will receive an auth denial at deploy time — collapsing to 404 — with no obvious error signal beyond agents silently failing.
    
    The description lists two acceptable verification forms (schema doc reference or one-off integration test against the real adapter) but does not record the outcome of either. Since all affected agents in production would be impacted simultaneously, this confirmation should be documented before the PR is merged.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Cursor Fix in Claude Code Fix in Codex

Fix All in Cursor Fix All in Claude Code Fix All in Codex

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
agentex/src/api/routes/agents.py:320-416
**Unconfirmed wire-contract before `execute → update`/`cancel` goes live**

The PR description itself flags this as a required pre-merge gate: changing the operation literal from `execute` to `update` (and introducing `cancel`) is only safe if SGP's task permission schema already maps both operations to the `owner` grant that every in-flight agent holds today. If it does not, every `MESSAGE_SEND`, `EVENT_SEND`, and `TASK_CANCEL` RPC will receive an auth denial at deploy time — collapsing to 404 — with no obvious error signal beyond agents silently failing.

The description lists two acceptable verification forms (schema doc reference or one-off integration test against the real adapter) but does not record the outcome of either. Since all affected agents in production would be impacted simultaneously, this confirmation should be documented before the PR is merged.

### Issue 2 of 2
agentex/src/utils/task_authorization.py:9-13
The leading underscore signals "module-private" by Python convention, but this function is imported and called from both `agents.py` and `authorization_shortcuts.py` — it is effectively part of the package's internal API. Using `_` here will mislead linters, IDE "find all usages" tools, and future contributors about the function's intended scope. Removing the underscore prefix makes the intent explicit.

```suggestion
async def check_task_or_collapse_to_404(
    authorization,
    task_id: str,
    operation: AuthorizedOperationType,
) -> None:
```

Reviews (1): Last reviewed commit: "feat(AGX1-275): per-RPC task permission ..." | Re-trigger Greptile

dm36 added a commit that referenced this pull request May 27, 2026
…se and two-factor mutations

Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks
into every api_key route, collapses denials to 404 (so name/id probes
can't distinguish "present in another tenant" from "absent"), and relies
on SpiceDB's transitive expansion of api_key.{update,delete} (= editor &
parent_agent->update & tenant_gate) for two-factor mutations rather than
issuing two explicit checks at the route layer.

- src/utils/agent_api_key_authorization.py (new):
  _check_api_key_or_collapse_to_404 — catches AuthorizationError, raises
  ItemDoesNotExist. Same shape as Asher's task helper.
- src/utils/authorization_shortcuts.py: DAuthorizedId routes
  AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't
  used for api_keys; the name lookup is (agent_id, name, api_key_type),
  not a single globally-unique path param — the route handlers call the
  collapse helper inline instead.)
- src/api/routes/agent_api_keys.py:
  * POST: explicit agent.update on parent (no api_key resource yet).
  * GET list: DAuthorizedResourceIds + filter; None passes through.
  * GET /name/{name}: inline collapse helper.
  * GET /{id}: DAuthorizedId(api_key, read).
  * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB
    schema (api_key.delete expands to parent_agent.update); no second
    route-layer check.
  * DELETE /name/{api_key_name}: inline collapse helper.
- tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass.

Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT
touch dual-write logic. Does NOT modify agentex-auth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@asherfink asherfink force-pushed the asher.fink/agx1-275-task-route-migration branch from b936a08 to 7841696 Compare May 27, 2026 21:15
@asherfink asherfink marked this pull request as ready for review May 27, 2026 21:59
@asherfink asherfink requested a review from a team as a code owner May 27, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant